104 research outputs found

    Semi-Automatic Identification of Bilingual Synonymous Technical Terms from Phrase Tables and Parallel Patent Sentences

    Extending Word-Level Quality Estimation for Post-Editing Assistance

    We define a novel concept, extended word alignment, to improve the efficiency of post-editing assistance. Based on extended word alignment, we further propose a novel task, refined word-level QE, that outputs refined tags and word-level correspondences. Compared with original word-level QE, the new task can directly point out editing operations and thus improves efficiency. To extract extended word alignments, we adopt a supervised method based on mBERT. To solve refined word-level QE, we first predict the original QE tags by training a regression model for sequence tagging based on mBERT and XLM-R. We then refine the original word tags with extended word alignment. In addition, we extract source-gap correspondences and, in doing so, obtain gap tags. Experiments on two language pairs show the feasibility of our method and suggest directions for further improvement.

    Identifying and Utilizing the Class of Monosemous Japanese Functional Expressions in Machine Translation

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    Towards Conceptual Indexing of the Blogosphere through Wikipedia Topic Hierarchy

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    Analysing features of Japanese splogs and characteristics of keywords

    This paper analyzes Japanese splogs based on various characteristics of the keywords they contain. By analyzing these keyword characteristics, we estimate how spammers behave when creating splogs from other sources. Since splogs often introduce noise into word occurrence statistics in the blogosphere, we assume that we can efficiently (manually) collect splogs by sampling blog homepages that contain keywords of a certain type on the date of their most frequent occurrence. We manually examine various features of the collected blog homepages: whether their text content is excerpted from other sources, and whether they display affiliate advertisements or outgoing links to affiliated sites. Among various informative results, it is important to note that more than half of the collected splogs are created by a very small number of spammers.

    Open-source Software for Developing Anthropomorphic Spoken Dialog Agents

    This paper discusses an architecture for a highly interactive, human-like spoken-dialog agent. To make it easy to integrate modules with different characteristics, including a speech recognizer, a speech synthesizer, a facial-image synthesizer, and a dialog controller, each module is modeled as a virtual machine that has a simple common interface, and the modules are connected to one another through a broker (communication manager). The agent system under development is supported by the IPA and will be made publicly available as a software toolkit this year.
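
The broker arrangement described in this abstract can be illustrated with a minimal sketch. It is not the toolkit's actual design or API: the class names (`Broker`, `Module`, `SpeechRecognizer`, `DialogController`, `SpeechSynthesizer`), the `receive`/`send` interface, and the toy reply logic are all hypothetical, chosen only to show modules that share one simple interface and communicate solely through a communication manager.

```python
from typing import Dict, List, Optional, Tuple


class Broker:
    """Hypothetical communication manager: routes messages between modules."""

    def __init__(self) -> None:
        self._modules: Dict[str, "Module"] = {}
        self.log: List[Tuple[str, str, str]] = []  # (sender, target, message)

    def register(self, module: "Module") -> None:
        self._modules[module.name] = module
        module.broker = self

    def send(self, sender: str, target: str, message: str) -> None:
        self.log.append((sender, target, message))
        self._modules[target].receive(sender, message)


class Module:
    """Assumed common interface: every module has a name and receive()."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.broker: Optional[Broker] = None

    def receive(self, sender: str, message: str) -> None:
        raise NotImplementedError

    def send(self, target: str, message: str) -> None:
        # Modules never call each other directly; they go through the broker.
        assert self.broker is not None, "module must be registered first"
        self.broker.send(self.name, target, message)


class SpeechRecognizer(Module):
    def receive(self, sender: str, message: str) -> None:
        # Stand-in for recognition: treat the input as already transcribed.
        self.send("dialog", message.lower())


class DialogController(Module):
    def receive(self, sender: str, message: str) -> None:
        # Toy dialog policy, for illustration only.
        reply = "Hello!" if "hello" in message else "Sorry, could you repeat that?"
        self.send("synthesizer", reply)


class SpeechSynthesizer(Module):
    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.spoken: List[str] = []

    def receive(self, sender: str, message: str) -> None:
        # Stand-in for synthesis: record what would be spoken.
        self.spoken.append(message)


def run_demo(utterance: str) -> List[str]:
    broker = Broker()
    synthesizer = SpeechSynthesizer("synthesizer")
    for module in (SpeechRecognizer("recognizer"), DialogController("dialog"), synthesizer):
        broker.register(module)
    broker.send("user", "recognizer", utterance)
    return synthesizer.spoken
```

Because each module depends only on the broker and the shared interface, a module such as a facial-image synthesizer could in principle be added by registering one more `Module` subclass without touching the others.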